The Narrow Waist of the Internet
Learn about the layered architecture of the internet to better understand how APIs work.
Introduction
APIs send requests and receive responses over a network. Sending data between two entities that could be in far-apart parts of the world is complex, and the internet manages that complexity with a layered architecture. Initially, the internet standardized on the Internet Protocol (IP) as a thin middleware around which all innovation would happen. Any application could be built on the expectation that IP would relay its data, and any new communication technology could flourish underneath the IP layer, provided it supported IP. However, many hiccups have occurred over the years in the attempt to achieve that goal.
Today, most public-facing API calls use HTTP. Suppose we send raw IP packets that carry our own custom transport protocol, something other than the Transmission Control Protocol (TCP) or the User Datagram Protocol (UDP). Such packets probably won't get far, because most enterprises use firewalls that allow only a handful of well-known applications and drop everything else. (In fact, such a packet might not even make it past our ISP's router!) Because this is a course on API design, our purpose here is to answer the following question from the first principles of networking:
Why does a large proportion of API calls converge to use REST over HTTPS?
We'll first review the layered architecture of the internet and then use those insights to answer the question above.
Layered architecture
A layered architecture offers the following benefits:
It provides modularity, so each layer performs its own operations.
It makes network troubleshooting easier.
A change in one layer doesn't affect the other layers, as long as the interfaces between them stay the same.
It flexibly accommodates different types of applications.
A layered approach divides a large task into smaller subtasks, which makes the overall problem easier to solve. When two systems communicate over a network (for instance, a client and a server), a request passes from one layer to the next, and each layer performs its own set of tasks.
The interface between adjacent layers (shown in the illustration above) links them and defines which services the lower layer makes available to the upper layer. Each layer has at least one protocol that determines the following:
The order of messages over the network.
The actions taken when a message is transmitted or received.
The structure of the messages.
How the receipt of a message is acknowledged.
Error detection and correction.
Protocol designers must also address the following concerns:
Reliability: The protocol should be able to detect errors and recover from them.
Resource allocation: The network's resources, such as link bandwidth and buffer space, are shared among the systems that want to communicate. The protocol manages the allocation and deallocation of these resources so that a new host doesn't have to wait.
Scalability: The network grows quickly, and congestion can arise. Protocols should adapt to such changes and scale with the network.
Security: The protocol can protect the network from threats, authenticate users' access to data, and prevent unauthorized manipulation of messages.
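Error detection, one of the protocol responsibilities listed above, becomes concrete with a small example. The sketch below computes the 16-bit ones' complement sum described in RFC 1071. The IPv4 header, TCP, and UDP checksums all use this arithmetic. It only *detects* corruption. Recovery, such as retransmission, is left to protocols like TCP. The function names `internet_checksum` and `is_intact` are our own, for illustration.

```python
def internet_checksum(data: bytes) -> int:
    """Return the 16-bit ones' complement checksum of data (RFC 1071 arithmetic)."""
    if len(data) % 2:
        data = data + b"\x00"  # pad odd-length input with a zero byte (no mutation of caller's data)
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]      # add each 16-bit big-endian word
        total = (total & 0xFFFF) + (total >> 16)   # fold any carry back into the low 16 bits
    return ~total & 0xFFFF                         # ones' complement of the folded sum


def is_intact(data_with_checksum: bytes) -> bool:
    """Receiver side: summing the data plus its checksum yields 0 when no error is detected."""
    return internet_checksum(data_with_checksum) == 0


packet = bytes([0x00, 0x01, 0xF2, 0x03, 0xF4, 0xF5, 0xF6, 0xF7])
checksum = internet_checksum(packet)  # -> 0x220D (the worked example from RFC 1071)
```

The sender appends the checksum (two bytes, big-endian) to even-length data. If a single bit flips in transit, `is_intact` returns `False`. A 16-bit sum can miss some multi-bit errors, which is one reason link layers add stronger checks such as CRCs.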
Layered models
Billions of people use the internet all over the world, and communication among their devices depends on standard models that ensure interoperability. Let's discuss the models that shaped the internet as we know it today:
The OSI model
The Open Systems Interconnection (OSI) model consists of seven layers. Each layer typically adds a header that contains information such as the protocol in use, addresses, sequence numbers, and so on. Let's understand the OSI model with the example of sending an email, shown in the slides below:
Application layer: The sender interacts with the application layer, which includes protocols such as HTTP and SMTP.
Presentation layer: It handles the data's format on the sender and receiver sides, ensuring that the data is understandable at the receiver's end. It may also encrypt, decrypt, and compress the data.
Session layer: It creates, manages, and terminates the sessions between end-to-end devices. It’s also responsible for authentication and reconnections.
Transport layer: It’s responsible for data ordering, reliability, and error checking.
Note: The application, presentation, session, and transport layers are present only on the end hosts, not on the routers (the intermediate nodes that use only the network, data link, and physical layers).
Network layer: IP addressing and routing are the responsibility of the network layer. A node connected to the internet needs an IP address to communicate with its peers.
Data link layer: This layer is responsible for frame transmission, physical (MAC) addressing on the local area network (LAN), and logical link control. In the OSI model, the data unit is called a packet at the network layer and a frame at the data link layer.
Physical layer: It handles the transmission of bitstreams on the physical link between the sender and the immediate next hop receiver. The physical link can be Ethernet, DSL, and so forth.
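The routing half of the network layer's job can be sketched as a longest-prefix-match lookup. A router compares the destination IP address against the prefixes in its forwarding table. It then forwards the packet to the next hop of the most specific prefix that matches. The table entries and next-hop names below are hypothetical. The sketch handles IPv4 only and uses a linear scan, whereas real routers use specialized data structures (such as tries) and hardware.

```python
import ipaddress

# Hypothetical IPv4 forwarding table: destination prefix -> next hop.
FORWARDING_TABLE = {
    ipaddress.ip_network("0.0.0.0/0"): "isp-gateway",        # default route matches everything
    ipaddress.ip_network("10.0.0.0/8"): "corp-core",
    ipaddress.ip_network("10.1.0.0/16"): "branch-office",
    ipaddress.ip_network("192.168.1.0/24"): "local-lan",
}


def next_hop(destination: str) -> str:
    """Return the next hop for the most specific (longest) prefix containing destination."""
    addr = ipaddress.ip_address(destination)
    matches = [net for net in FORWARDING_TABLE if addr in net]
    best = max(matches, key=lambda net: net.prefixlen)  # longest prefix wins
    return FORWARDING_TABLE[best]


print(next_hop("10.1.2.3"))  # branch-office: /16 is more specific than /8 and /0
print(next_hop("8.8.8.8"))   # isp-gateway: only the default route matches
```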
The OSI model is generally treated as a reference model. For a number of technical and nontechnical reasons, it was never widely implemented. For one, the roles of the session and presentation layers were never well understood, because these layers are specific to applications rather than to the network. These two layers could have been merged into the application layer, which is what the TCP/IP model did.
Note: Historically, TCP/IP was introduced before the OSI model, and it has no separate session, presentation, or physical layer. We can think of this difference in the following way: the concerns of the session and presentation layers are handled by the application layer, and the concerns of the physical layer are handled together with those of the data link layer. The contribution of the OSI model was a clear separation of concerns into layers and their service interfaces. However, OSI lacked widely adopted protocols, and that is where the TCP/IP model came in.
The TCP/IP model
The Transmission Control Protocol/Internet Protocol (TCP/IP) model is based mainly on its two core protocols, IP and TCP. The protocol suite forms an hourglass shape with the IP layer as its narrow waist, so the protocols above and below IP can evolve independently. New protocols can be added at the application, transport, and network access layers according to the application's requirements.
Each layer provides particular services and communicates with the layers directly above and below it. The physical and data link layers are combined into the network access layer, which covers multiple network technologies such as Ethernet. The internet layer is responsible for logical addressing and for routing data across networks. The transport layer uses port numbers to identify the application and delivers the data to the corresponding application on the remote peer. The application layer contains protocols such as HTTP, SMTP, and FTP, which applications use to exchange and interpret data. TCP/IP is an open protocol suite that anyone can use.
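The layers described above can be modeled as *encapsulation*. On the sender, each layer treats whatever the layer above hands it as an opaque payload and wraps it with its own header. On the receiver, the headers are removed in reverse order. The transport layer then uses the destination port to choose the receiving application. The sketch below is a simplified model, not real packet formats. The classes stand in for headers, and the port-to-application table, MAC address, and IP address (taken from the 203.0.113.0/24 documentation range) are hypothetical. It also skips the address checks that real link and internet layers perform.

```python
from dataclasses import dataclass


@dataclass
class Segment:      # transport layer: the port identifies the receiving application
    dst_port: int
    data: bytes


@dataclass
class Packet:       # internet layer: the logical address used for routing
    dst_ip: str
    segment: Segment


@dataclass
class Frame:        # network access layer: the hardware address on the local link
    dst_mac: str
    packet: Packet


def encapsulate(data: bytes, dst_port: int, dst_ip: str, dst_mac: str) -> Frame:
    """Sender: each layer wraps what the layer above handed it."""
    return Frame(dst_mac, Packet(dst_ip, Segment(dst_port, data)))


# Hypothetical table of applications listening on the receiving host.
APPLICATIONS = {80: "web server (HTTP)", 443: "web server (HTTPS)", 25: "mail server (SMTP)"}


def decapsulate(frame: Frame) -> tuple[str, bytes]:
    """Receiver: strip headers bottom-up, then demultiplex by destination port."""
    packet = frame.packet       # network access layer removes the frame header
    segment = packet.segment    # internet layer removes the IP header
    app = APPLICATIONS.get(segment.dst_port, "no listener: drop")
    return app, segment.data    # transport layer hands the data to the application


frame = encapsulate(b"GET / HTTP/1.1\r\n\r\n", 80, "203.0.113.10", "aa:bb:cc:dd:ee:ff")
print(decapsulate(frame))  # ('web server (HTTP)', b'GET / HTTP/1.1\r\n\r\n')
```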
The narrow waist of the Internet
The Internet Protocol (IP) has long dominated the network layer, largely because it maximizes interoperability, makes addressing universal, and provides a simple best-effort service. For this reason, the internet is said to have a narrow waist architecture: a single protocol, IP, supports many transport and application protocols above it and many communication technologies below it. The concept is depicted in the illustration below:
However, the Internet has evolved in areas such as real-time responsiveness and security, and HTTP-based applications have grown explosively. Together, these trends have led to the argument that HTTP is the new narrow waist of the Internet.
Some of the reasons for this insight are as follows:
Existing infrastructure, such as proxies, caches, and content delivery networks (CDNs), is built around HTTP and facilitates its evolution.
HTTP easily passes through enterprise firewalls, which makes it the obvious choice for nearly any application.
HTTP has evolved to support new workloads, especially streaming applications.
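The firewall argument, made in the introduction and in the list above, can be illustrated with a default-deny rule set. The sketch below accepts only a few well-known services and drops everything else. That includes a packet that carries a custom transport protocol directly over IP. IP protocol numbers 6 (TCP) and 17 (UDP) are the standard assignments, and 253 is one of the numbers reserved for experimentation (RFC 3692). The allowlist itself is a hypothetical example and does not reflect any particular product's configuration.

```python
from typing import NamedTuple, Optional

TCP, UDP = 6, 17        # IANA-assigned IP protocol numbers
EXPERIMENTAL = 253      # reserved for experimentation; stands in for a custom transport protocol

# Hypothetical enterprise allowlist of (IP protocol, destination port) pairs.
ALLOWED = {
    (TCP, 80),    # HTTP
    (TCP, 443),   # HTTPS
    (UDP, 53),    # DNS
}


class PacketInfo(NamedTuple):
    protocol: int                    # value of the IP header's protocol field
    dst_port: Optional[int] = None   # None when the firewall can't parse a port


def filter_packet(pkt: PacketInfo) -> str:
    """Default-deny: accept only explicitly allowed (protocol, port) pairs."""
    if (pkt.protocol, pkt.dst_port) in ALLOWED:
        return "ACCEPT"
    return "DROP"


print(filter_packet(PacketInfo(TCP, 443)))        # ACCEPT: HTTPS
print(filter_packet(PacketInfo(TCP, 5432)))       # DROP: well-formed TCP, but not allowlisted
print(filter_packet(PacketInfo(EXPERIMENTAL)))    # DROP: unknown transport protocol
```

Under a policy like this, an API that wants to reach clients inside enterprise networks has little choice but to ride on TCP port 443. In practice, that means HTTPS.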
HTTP traffic already dominated the web in the 1990s, and HTTP-based applications grew further in the early 2000s as the REST architectural style took hold. However, a growing number of applications across the web depended on streaming media, which posed a challenge for HTTP. Over the past decade, HTTP has overcome this limitation through HTTP-based streaming.
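One building block HTTP/1.1 offers for streaming is chunked transfer coding. With it, a server can start sending a response before it knows the total length. Each chunk is prefixed with its size in hexadecimal, and a zero-length chunk marks the end. Media streaming today also relies on other techniques, such as segment-based adaptive streaming (for example, HLS or DASH), so treat this as one illustrative mechanism. The sketch below encodes and decodes a chunked body. The function names are our own, and the code ignores chunk extensions and trailer fields.

```python
from typing import Iterable


def encode_chunked(chunks: Iterable[bytes]) -> bytes:
    """Encode pieces of a response body using HTTP/1.1 chunked transfer coding."""
    out = bytearray()
    for chunk in chunks:
        if not chunk:
            continue  # an empty chunk would prematurely signal the end of the body
        out += f"{len(chunk):X}\r\n".encode("ascii") + chunk + b"\r\n"  # hex size, data, CRLF
    out += b"0\r\n\r\n"  # last chunk (size 0) followed by an empty trailer section
    return bytes(out)


def decode_chunked(body: bytes) -> bytes:
    """Reassemble a chunked body (chunk extensions and trailers are not supported)."""
    data, pos = bytearray(), 0
    while True:
        line_end = body.index(b"\r\n", pos)
        size = int(body[pos:line_end], 16)  # chunk size is written in hexadecimal
        if size == 0:
            return bytes(data)
        start = line_end + 2
        data += body[start:start + size]
        pos = start + size + 2  # skip the CRLF that follows the chunk data


wire = encode_chunked([b"Hello, ", b"world!"])
print(wire)                  # b'7\r\nHello, \r\n6\r\nworld!\r\n0\r\n\r\n'
print(decode_chunked(wire))  # b'Hello, world!'
```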
With web APIs, businesses depend on HTTP to exchange data, speed up development, and add functionality in little to no time without rewriting code. Given this level of reliance, and given that a large share of internet traffic runs over HTTP, it's reasonable to conclude that HTTP is the new narrow waist of the Internet.
HTTP is critical to the continued growth of businesses because of its content-centric and versatile nature. For data transfer between companies, HTTP is the engine that runs REST APIs, so its use is likely to keep growing. However, HTTP relies on a transport protocol (TCP) and a security protocol (TLS) for safe communication. (HTTP/3 instead runs over QUIC on top of UDP, with TLS 1.3 integrated into QUIC.) This means that the narrow waist of the Internet can be expanded to include IP, TCP, TLS, and HTTP.
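To show how IP, TCP, TLS, and HTTP stack up in a single REST call, the sketch below uses Python's standard `socket` and `ssl` modules. `socket.create_connection` opens a TCP connection, which runs over IP. `SSLContext.wrap_socket` then adds TLS on top of it. The HTTP/1.1 request itself is plain text sent over that encrypted channel. The host `api.example.com` and the `/users/42` path are hypothetical placeholders. A production client would normally use an HTTP library rather than hand-building requests.

```python
import socket
import ssl


def build_get_request(host: str, path: str) -> bytes:
    """HTTP layer: a REST-style GET request is just formatted text."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Accept: application/json",
        "Connection: close",  # ask the server to close the connection after responding
        "",                   # blank line ends the header section
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def rest_get(host: str, path: str, port: int = 443) -> bytes:
    """Send the request over TLS over TCP over IP and return the raw HTTP response."""
    context = ssl.create_default_context()                   # TLS settings; verifies certificates
    with socket.create_connection((host, port)) as tcp_sock:  # TCP connection (IP underneath)
        with context.wrap_socket(tcp_sock, server_hostname=host) as tls_sock:  # TLS handshake
            tls_sock.sendall(build_get_request(host, path))  # HTTP request bytes
            response = bytearray()
            while chunk := tls_sock.recv(4096):              # read until the server closes
                response += chunk
    return bytes(response)
```

Calling `rest_get("api.example.com", "/users/42")` needs network access and a real HTTPS server. It would return the raw status line, headers, and body. Each `with` block corresponds to one layer of the expanded narrow waist.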
Summary
Perhaps the most important lesson here is that once a system becomes operational, it evolves in ways its designers did not anticipate. The original internet designers wanted the IP layer to be a thin middleware that enabled independent innovation above and below it. Today, however, many public-facing APIs tunnel through HTTP, for two reasons. First, HTTP is one of the few applications that enterprises allow on their networks. Second, a complementary ecosystem, including proxies, caches, and CDNs, has evolved around HTTP over time.
Quiz
Which protocol is the new narrow waist of the modern Internet?
File Transfer Protocol (FTP)
HyperText Transfer Protocol (HTTP)
This is due to the modern internet's heavy reliance on HTTP.
Internet Protocol (IP)
None of the above